Enriching Wikipedia Vandalism Taxonomy via Subclass Discovery

نویسندگان

  • Si-Chi Chin
  • William Nick Street
چکیده

This paper adopts an unsupervised subclass discovery approach to automatically improve the taxonomy of Wikipedia vandalism. Wikipedia vandalism, defined as malicious editing intended to compromise the integrity of the content of articles, exhibits heterogeneous characteristics, making it hard to detect automatically. The categorization of vandalism provides insights on the detection of vandalism instances. Experimental results demonstrate the potential of using supervised and unsupervised learning to reproduce the manual annotation and enrich the predefined knowledge representation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis

Community-based knowledge forums, such as Wikipedia, are susceptible to vandalism, i.e., ill-intentioned contributions that are detrimental to the quality of collective intelligence. Most previous work to date relies on shallow lexico-syntactic patterns and metadata to automatically detect vandalism in Wikipedia. In this paper, we explore more linguistically motivated approaches to vandalism de...

متن کامل

Spatio-Temporal Analysis of Reverted Wikipedia Edits

Little is known about what causes anti-social behavior online. The paper at hand analyzes vandalism and damage in Wikipedia with regard to the time it is conducted and the country it originates from. First, we identify vandalism and damaging edits via ex post facto evidence by mining Wikipedia’s revert graph. Second, we geolocate the cohort of edits from anonymous Wikipedia editors using their ...

متن کامل

Wikipedia Vandalism Detection Through Machine Learning: Feature Review and New Proposals - Lab Report for PAN at CLEF 2010

Wikipedia is an online encyclopedia that anyone can edit. In this open model, some people edits with the intent of harming the integrity of Wikipedia. This is known as vandalism. We extend the framework presented in (Potthast, Stein, and Gerling, 2008) for Wikipedia vandalism detection. In this approach, several vandalism indicating features are extracted from edits in a vandalism corpus and ar...

متن کامل

Detecting Vandalism on Wikipedia across Multiple Languages

Vandalism, the malicious modification or editing of articles, is a serious problem for free and open access online encyclopedias such as Wikipedia. Over the 13 year lifetime of Wikipedia, editors have identified and repaired vandalism in 1.6% of more than 500 million revisions of over 9 million English articles, but smaller manually inspected sets of revisions for research show vandalism may ap...

متن کامل

Using Language Models to Detect Wikipedia Vandalism

This paper explores a statistical language modeling approach for detecting Wikipedia vandalism. Wikipedia is a popular and influential collaborative information system. The collaborative nature of authoring, as well as the high visibility of its content, have exposed Wikipedia articles to vandalism, defined as malicious editing intended to compromise the integrity of the content of articles. Ex...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011